Abstract


Recent  results  on  coding  capacity  and  information  capacity  for  the 
mismatched  Gaussian  channel  are  discussed.  Sufficient  conditions  for  causal 
feedback  to  increase  information  capacity  are  given  for  the  finite-dimensional 
discrete-time  Gaussian  channel. 




Introduction 


The  capacity  (in  the  Shannon  sense)  of  a  communications  channel  is 
usually  defined  by  either  of  two  principal  prescriptions.  Information  capacity 
is  the  supremum  of  the  average  mutual  information  between  an  input  stochastic 
process  (signal)  and  the  noise-perturbed  output  process,  with  the  supremum 
taken  over  an  appropriate  class  of  admissible  input  processes.  The  second 
definition  is  that  of  the  supremum  of  all  possible  transmission  rates,  where 
the transmitted code words are subject to a constraint. For example, in the
time-discrete additive channel, define the number of distinct code words
transmitted by time $n$ as $[e^{nR}]$, where $[x]$ is the integer part of $x$,
and $R$ is the "rate." If $R$ is fixed and the maximum probability of decoding
error goes to zero as $n \to \infty$ along some subsequence, then $R$ is said
to be an admissible rate (for the channel and the constraints). The
(deterministic) coding capacity is then the supremum over all admissible
rates. One can also consider random coding and other capacities connected with
coding; only deterministic coding capacity will be considered here.
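
As a small numerical illustration of the code-size prescription above, the
following Python sketch evaluates $[e^{nR}]$ for a given block length and rate.
The function name `code_size` is a hypothetical helper introduced here, and the
rate is assumed to be measured in nats per channel use.

```python
import math


def code_size(n: int, rate: float) -> int:
    """Number of code words [e^{nR}]: the integer part of exp(n * rate).

    Hypothetical helper; `rate` is assumed to be in nats per channel use.
    """
    if n < 1 or rate < 0:
        raise ValueError("need n >= 1 and rate >= 0")
    return math.floor(math.exp(n * rate))


if __name__ == "__main__":
    # For a fixed rate the code size grows exponentially in the block length.
    for n in (1, 5, 10):
        print(n, code_size(n, 0.5))
```

A rate $R$ is admissible only if codes of this size exist whose maximum error
probability tends to zero; the sketch computes the size, not the error
probability.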

The additive Gaussian channel is a channel of primary practical importance.
The received waveform is the sum of the transmitted waveform and a sample
function from a Gaussian process: $Y = X + N$, where $N$ is noise and $X$ is
signal. If the channel is without feedback, and $X$ is a sample function from
a stochastic process, then $N$ is usually independent of $X$. With feedback,
$X$ will be a function of the past values of $Y$, and will thus depend upon $N$.

In this paper, a general discussion is first given of some recent results
on information capacity and coding capacity of additive Gaussian channels when
the constraint is mismatched to the channel noise; that is, the constraint is
given in terms of a covariance that is different from the noise covariance.
Such a "mismatched channel" is the usual case in practice, since one will
rarely know the exact covariance of the noise. Moreover, in some situations,
such as jamming channels, the mismatch occurs as an essential part of the
problem formulation. The results given here on information capacity without
feedback appear in [3]; the results on coding capacity will appear in [5]. A
second set of new results summarized here consists of sufficient conditions
for causal feedback to increase information capacity [2]. A statement and
proof are given for the finite-dimensional time-discrete channel.
The channels considered here can be nonstationary and can have memory.
Thus,  in  the  discrete-time  case,  it  is  not  required  that  the  noise  covariance 
matrix  be  a  diagonal  matrix. 

Information  Capacity  and  Coding  Capacity  of  Gaussian  Channels  Without  Feedback 

For  information  capacity  of  Gaussian  channels  without  feedback,  solutions 
are  given  in  [1]  and  [3].  The  framework  there  is  for  stochastic  processes 
inducing  measures  on  Hilbert  space.  These  results  can  be  extended  to  measures 
induced on a class of linear topological spaces; see [9] and [4].

Consider now the additive time-discrete Gaussian channel without feedback,
with processes involved having sample paths in $\ell_2$. Let $X$ denote an
input stochastic process, $Y = X + N$ as above, and $I[X, Y]$ the mutual
information between $X$ and $Y$ (see, e.g., [1] for basic definitions). Let
$R_N$ denote the covariance operator of the noise, and let $R_W$ denote another
covariance operator. Define the constraint on $X$ by $E\|X\|_W^2 \le P$, where
$E(\cdot)$ denotes expectation with respect to the probability on $\ell_2$
defined by $X$, and $\|\cdot\|_W$ is the reproducing kernel Hilbert space
(RKHS) norm for $R_W$: $\|y\|_W = \|R_W^{-1/2} y\|$ ($\|\cdot\|$ the $\ell_2$
norm); one can assume WLOG that $R_W^{-1}$ exists. If $R_W = R_N$, then the
supremum of $I[X, X+N]$ over all such admissible $X$ processes is equal to
$P/2$.

For the same channel, with deterministic coding used, for each $n \ge 1$,
constrain each code word $x$ to belong to $\mathbb{R}^n$ and to satisfy
$\|x\|_{W,n}^2 \le nP$, where $\|\cdot\|_{W,n}$ is the RKHS norm of $R_{W,n}$,
with the $n \times n$ matrix $R_{W,n}$ given by $R_{W,n}(ij) = R_W(ij)$,
$i, j \le n$. A code $\{k, n, \epsilon_n\}$ is then a set of $k$ code words,
each obeying the constraint, with maximum probability of decoding error being
$\le \epsilon_n$. A real number $R > 0$ is then an admissible rate if there
exists a sequence $(\{[e^{nR}], n, \epsilon_n\})$ of codes such that
$\epsilon_n \to 0$ as $n \to \infty$ along some subsequence. The supremum of
all admissible rates is the coding capacity, denoted here by $C_W(P)$. If
$R_W = R_N$, then $C_W(P) = \frac{1}{2}\log[1+P]$.

Those familiar with the Shannon theory will recognize the similarity of
the above results to those obtained for the classical white noise channel with
a pure power constraint [6]. However, this similarity disappears when one
examines the "mismatched" channel: $R_W \ne R_N$. The expression for the
information capacity then takes one of several forms, depending on the
relationship between $R_W$ and $R_N$. For finite information capacity, one
must have $R_N = R_W^{1/2}(I+S)R_W^{1/2}$, where $I$ is the identity in
$\ell_2$ and $S$ is a self-adjoint operator in $\ell_2$ such that
$(I+S)^{-1}$ exists and is bounded [3]. The information capacity then depends
on the spectrum of $S$; specifically, on the smallest limit point of the
spectrum, denoted by $\theta$, and those eigenvalues (if such exist) of $S$
that are strictly less than $\theta$. See [3] or [4] for the various
expressions. These expressions are considerably more complicated than that for
the matched channel.
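
For the matched case, the two values quoted above (information capacity $P/2$
and coding capacity $\frac{1}{2}\log[1+P]$) can be compared numerically. The
sketch below is only an illustration; the function names are hypothetical, and
natural logarithms (capacity in nats) are assumed.

```python
import math


def matched_information_capacity(power: float) -> float:
    """Information capacity P/2 of the matched channel (R_W = R_N)."""
    return power / 2.0


def matched_coding_capacity(power: float) -> float:
    """Coding capacity (1/2) log(1 + P) of the matched channel, in nats."""
    return 0.5 * math.log1p(power)


if __name__ == "__main__":
    for p in (0.1, 1.0, 10.0):
        info = matched_information_capacity(p)
        coding = matched_coding_capacity(p)
        # log(1 + P) < P for every P > 0, so the coding value is the smaller.
        print(f"P={p}: information={info:.4f}, coding={coding:.4f}")
```

Since $\log(1+P) < P$ for every $P > 0$, the coding value is strictly smaller
than the information value, and the gap widens as $P$ grows.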


For coding capacity when $R_W \ne R_N$, one again obtains a rather
complicated expression for the capacity. In [5], a solution is given for
capacity under the assumption that $S$ has a pure point spectrum. The solution
is a function of the limit points of the spectrum of $S$ and of their
"relative importance." For the memoryless channel, where $R_N$ is diagonal,
this "relative importance" can be roughly described as the relative frequency
of each limit point.

In the case where the spectrum of $S$ has a single limit point, $\theta$,
and $S$ has no eigenvalues strictly less than $\theta$, one obtains a result
analogous to that of the matched channel ($R_W = R_N$): the information
capacity is equal to $\frac{P}{2(1+\theta)}$, and the coding capacity is equal
to $\frac{1}{2}\log\left[1 + \frac{P}{1+\theta}\right]$.


In the analogous problems for the time-continuous channel, the constraint
on the code words is given by $\|x\|_{W,T}^2 \le PT$, where $x$ is required to
belong to $L_2[0,T]$, and $\|\cdot\|_{W,T}$ is the RKHS norm of $R_{W,T}$.
$R_{W,T}$ is obtained from a covariance function $r_W$, defined on
$[0,\infty) \times [0,\infty)$, and $R_{W,T}$ is the integral operator defined
by the restriction of $r_W$ to $[0,T] \times [0,T]$. In this case, assuming
that range($R_{W,T}$) is infinite-dimensional for some $T > 0$, the coding
capacity when $R_{W,T} = R_{N,T}$ for all $T > 0$ is given by $P/2$. When
$R_{W,T} \ne R_{N,T}$ with $R_{N,T} = R_{W,T}^{1/2}(I_T + S_T)R_{W,T}^{1/2}$,
$I_T$ the identity in $L_2[0,T]$, then the coding capacity depends on the
behavior of $\{\theta_T, T > 0\}$ and $\{\lambda_n^T, n \ge 1, T > 0\}$, where
$\theta_T$ is the smallest limit point of the spectrum of $S_T$ and
$\{\lambda_n^T, n \ge 1\}$ is the set of eigenvalues of $S_T$ that are
strictly less than $\theta_T$. If $\{\lambda_n^T, n \ge 1\}$ is empty for all
sufficiently large $T$, then the coding capacity is
$\frac{1}{2}\,\frac{P}{1+\theta}$, where $\theta = \lim_{T \to \infty}
\theta_T$. However, in general $\frac{1}{2}\,\frac{P}{1+\theta}$ is only a
lower bound for the coding capacity.

Thus, the results for coding capacity and for information capacity of the
mismatched channel ($R_W \ne R_N$) both differ significantly from the
corresponding results for the matched channel. For further details, reference
is made to [1], [3], and [5].

All of the above discussion is for the additive Gaussian channel without
feedback. In the case of channels with causal feedback, the solutions for
information capacity and for coding capacity have not been obtained in the
case of the mismatched channel. For the matched channel, information capacity
when $N$ is the Wiener process has been obtained [8], and this has been
extended to obtain capacity for some more general Gaussian processes [7]. In
both cases, it has been found that causal feedback does not increase capacity.
A solution has not been published for the general additive Gaussian channel,
even for the matched case ($R_W = R_N$).


Feedback  Capacity 

Information capacity of the mismatched Gaussian channel with feedback is
an open problem. It has long been speculated that causal feedback can increase
capacity over the no-feedback situation. An answer will be given here to this
question for the discrete-time finite-dimensional channel: processes take
values in $\mathbb{R}^K$. These results and other results for
infinite-dimensional channels were announced at the 1986 IEEE Symposium on
Information Theory [2].

The channel output is $Y = X - BY + N$, where $X$ is the message process,
$N$ is Gaussian noise independent of the message, and $B$ is a
strictly-lower-triangular (SLT) matrix ($b_{ij} = 0$ for $j \ge i$). The
transmitted signal is $X - BY$. All processes are defined on a probability
space $(\Omega, \beta, \mu)$, and $E$ will be used to denote expectation with
respect to $\mu$. The capacity problem is the following:

maximize $I[X, Y]$

subject to $E\|X - BY\|^2 \le P$,

where $\|\cdot\|$ is the norm for a $K$-dimensional Euclidean space:
$\|X\|^2 = \sum_{i=1}^{K} X_i^2$. $I[X, Y]$ denotes mutual information of $X$
and $Y$. See [1] for definitions. Let

$C_F(P) = \sup_{F} I[X, X - BY + N]$

$C(P) = \sup_{F_1} I[X, X + N]$

where $F = \{(X, B): E\|X - BY\|^2 \le P,\ Y = X - BY + N,\ B \text{ SLT}\}$
and $F_1 = \{X: E\|X\|^2 \le P\}$.

An "elementary vector" in $\mathbb{R}^K$ is a vector $x$ such that $x_k = 1$,
$x_i = 0$ for $i \ne k$, some $k$ in $\{1, 2, \ldots, K\}$.

The  main  results  of  this  section  are  contained  in  the  following  theorem. 


THEOREM. $C_F(P) > C(P)$ for all $P > 0$ if the eigenmanifold for the smallest
eigenvalue of $R_N$ does not have a basis consisting entirely of elementary
vectors which are eigenvectors of $R_N$.

$C_F(P) > C(P)$ for all sufficiently large $P$ if $R_N$ is not a diagonal
matrix.

□

In  order  to  prove  the  result,  the  problem  will  first  be  reformulated  into 
an  equivalent  no-feedback  problem  involving  a  pure  power  constraint. 

Reformulation  of  the  Problem 

$Y = X - BY + N$; since $B$ is SLT, $Y = (I+B)^{-1}(X+N)$. Moreover, as
$(I+B)^{-1}$ is 1:1, $I[X, Y] = I[X, X+N]$. The constraint is
$E\|X - BY\|^2 \le P$, which can be written as
$E\|X - B(I+B)^{-1}(X+N)\|^2 \le P$. Since $B$ is SLT, $I + B$ is lower
triangular, so $(I+B)^{-1}$ is lower triangular and $B(I+B)^{-1}$ is again SLT.
Given any SLT $C$, there exists a SLT $B$ satisfying $C = B(I+B)^{-1}$; simply,
$B = (I-C)^{-1}C$. The original feedback problem is thus equivalent to finding
sup $I[X, X+N]$ subject to $E\|X - C(X+N)\|^2 \le P$, where $C$ is any SLT
matrix.
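
The correspondence between $B$ and $C = B(I+B)^{-1}$ can be checked on a small
example. The Python sketch below is illustrative only: the helper names are
hypothetical, integer entries are used so that the arithmetic is exact, and
the inverse is computed from the finite Neumann series, which is valid because
an SLT matrix is nilpotent.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n: int) -> Matrix:
    return [[int(i == j) for j in range(n)] for i in range(n)]


def inverse_of_identity_plus(m: Matrix) -> Matrix:
    """Return (I + M)^{-1} for a strictly lower triangular K x K matrix M.

    M is nilpotent (M^K = 0), so I - M + M^2 - ... stops after K terms.
    """
    n = len(m)
    minus_m = [[-x for x in row] for row in m]
    result = identity(n)
    power = identity(n)
    for _ in range(n - 1):
        power = matmul(power, minus_m)
        result = [[result[i][j] + power[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def is_slt(m: Matrix) -> bool:
    """True if every entry on or above the diagonal is zero."""
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(i, n))


def feedback_to_equivalent(b: Matrix) -> Matrix:
    """C = B (I + B)^{-1}."""
    return matmul(b, inverse_of_identity_plus(b))


def equivalent_to_feedback(c: Matrix) -> Matrix:
    """B = (I - C)^{-1} C."""
    minus_c = [[-x for x in row] for row in c]
    return matmul(inverse_of_identity_plus(minus_c), c)


if __name__ == "__main__":
    B = [[0, 0, 0], [2, 0, 0], [-1, 3, 0]]
    C = feedback_to_equivalent(B)
    print(C, is_slt(C), equivalent_to_feedback(C) == B)
```

For the matrix $B$ in the example, $C$ is again SLT and the inverse map
recovers $B$, as the argument above asserts.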

Using the above, attention can now be restricted to the following
problems.

$C_F(P) = \sup_{F'(P)} I[X, X+N]$

$C(P) = \sup_{F'_1(P)} I[X, X+N]$

where $F'(P)$ is the set of all Gaussian random vectors in $\mathbb{R}^K$ such
that $E\|X - B(X+N)\|^2 \le P$ for some SLT matrix $B$, and $F'_1(P)$ is the
set of all Gaussian random vectors in $\mathbb{R}^K$ such that
$E\|X\|^2 \le P$.


Structure  of  the  Reformulated  Problem 


Let $H(\mathbb{R}^K, \mu)$ be the set of all $K$-component real random vectors
$f$ on $(\Omega, \beta)$ such that $E\sum_{n=1}^{K} f_n^2(\omega) < \infty$.
$H(\mathbb{R}^K, \mu)$ is a Hilbert space under the inner product
$(f, g)_\mu = E\sum_{n=1}^{K} f_n(\omega) g_n(\omega)$. Suppose that $X$ and
$N$ are two mutually independent zero-mean Gaussian (w.r.t. $\mu$) random
vectors: $E X_n(\omega) N_m(\omega) = 0$ for all $n, m \le K$. Suppose also
that $N$ has non-singular covariance matrix $R_N$. Let $H_{-}(X+N)$ be the set
of all random vectors $f$ in $H(\mathbb{R}^K, \mu)$ having the form
$f = B(X+N)$, where $B$ is an SLT matrix. It is clear that $H_{-}(X+N)$ is a
linear manifold. It is also closed in $H(\mathbb{R}^K, \mu)$ norm since, for a
sequence $(B^n)$ of SLT matrices (superscripts are indices, not powers),

$\|B^n(X+N) - B^m(X+N)\|_\mu^2 = \|(B^n - B^m)(X+N)\|_\mu^2$

$= \mathrm{Trace}\,(B^n - B^m)(R_X + R_N)(B^n - B^m)^* \ge t_0\,\mathrm{Tr}\,(B^n - B^m)(B^n - B^m)^*,$

where $t_0$ is the minimum eigenvalue of $R_N$.

Thus, if $(B^n(X+N))$ is Cauchy in $H(\mathbb{R}^K, \mu)$, then
$\mathrm{Tr}\,(B^n - B^m)(B^n - B^m)^* \to 0$. This is equivalent to
$\sum_{i,j=1}^{K} (B^n_{ij} - B^m_{ij})^2 \to 0$. Hence $(B^n_{ij})$ must be
Cauchy for each $i, j$, and so the limit exists as an SLT matrix $B$.

Now let $N$ be a fixed Gaussian vector. For any Gaussian vector $X$
independent of $N$, let $P_{-}X$ be the projection of $X$ onto $H_{-}(X+N)$.
The feedback problem is now to choose a Gaussian vector $X$ so that
$I[X, X+N]$ is maximized, while

$\|X - P_{-}X\|_\mu^2 \le P.$

That is, if one chooses any Gaussian vector $X$ with SLT feedback matrix $B$,
such that $E\|X - B(X+N)\|^2 \le P$, then necessarily
$E\|X - B(X+N)\|^2 \ge \|X - P_{-}X\|_\mu^2$, and since $P_{-}X = C(X+N)$ for
some SLT matrix $C$ (since $H_{-}(X+N)$ is closed) one can replace $B$ with $C$
and be assured that the constraint is still satisfied.

It can be seen from the above that $C_F(P) > C(P)$ if the optimum solution $X$
for the no-feedback message is not orthogonal to $H_{-}(X+N)$. In fact, if
this condition is satisfied, then for the optimum no-feedback message $X$, and
$a \ne 0$, $E\|aX - B(aX+N)\|^2 \le P$ gives $a^2 E\|X\|^2 \le P + A$, where
$A = \mathrm{Tr}\,B[a^2 R_X + R_N]B^* = a^2\,\mathrm{Tr}\,B R_X$, and
$B(aX + N)$ is the projection of $aX$ onto $H_{-}(aX+N)$. Since
$E\|X\|^2 = P$ for the optimum no-feedback message $X$, setting
$a^2 E\|X\|^2 = P + A$ gives $a^2 = 1 + A/P$, so that $a^2 > 1$ whenever
$A > 0$. Thus, one can replace $X$ in the no-feedback problem with $aX$, use
the upper bound $P + A$ in place of $P$, and obtain a strict increase in
capacity. Of course, $A$ depends on $a$.

The above requires that the optimum no-feedback message $X$ not be
orthogonal to $H_{-}(X+N)$. Since $X$ is independent of $N$,
$(X, B(X+N))_\mu = \mathrm{Tr}\,B R_X$ for every SLT matrix $B$, so $X$ is
orthogonal to $H_{-}(X+N)$ if and only if $\mathrm{Tr}\,B R_X = 0$ for all SLT
matrices $B$.
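
For $K = 2$ the projection onto $H_{-}(X+N)$ can be written in closed form,
which makes the role of $\mathrm{Tr}\,B R_X$ concrete. The sketch below is an
illustration under that assumption (two components, $a = 1$, zero-based
indices); the function names are hypothetical. An SLT matrix then has a single
free entry $b$ in position (1, 0), and the power removed by feedback is
$A = \mathrm{Tr}\,B R_X = R_X(1,0)^2 / (R_X(0,0) + R_N(0,0))$.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def best_feedback_coefficient(r_x: Matrix, r_n: Matrix) -> Tuple[float, float]:
    """Return (b, A) for K = 2.

    b is the (1, 0) entry of the SLT matrix B for which B(X+N) is the
    projection of X onto the SLT manifold; A = Tr(B R_X) is the power saved.
    """
    s = r_x[0][0] + r_n[0][0]      # variance of the first component of X+N
    b = r_x[1][0] / s              # least-squares coefficient
    saving = b * r_x[1][0]         # Tr(B R_X) for this B
    return b, saving


def transmitted_power(r_x: Matrix, r_n: Matrix, b: float) -> float:
    """E||X - B(X+N)||^2 for B = [[0, 0], [b, 0]], X and N independent."""
    first = r_x[0][0]
    second = r_x[1][1] - 2.0 * b * r_x[1][0] + b * b * (r_x[0][0] + r_n[0][0])
    return first + second


if __name__ == "__main__":
    r_n = [[1.0, 0.3], [0.3, 1.0]]
    correlated = [[1.0, 0.5], [0.5, 1.0]]
    b, saving = best_feedback_coefficient(correlated, r_n)
    print(b, saving, transmitted_power(correlated, r_n, b))
```

When $R_X(1,0) = 0$ the coefficient and the saving are both zero, which is the
orthogonal case; otherwise the transmitted power is strictly below
$E\|X\|^2$.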

PROPOSITION. $\mathrm{Tr}\,B R_X = 0$ for every SLT matrix $B$ if and only if
$R_X$ is diagonal.

Proof. Since $(B R_X)_{ii} = \sum_{j} B_{ij} R_X(ji)$, it is clear that
$\mathrm{Tr}\,B R_X = 0$ for every SLT matrix $B$ if $R_X$ is diagonal. Now
suppose that $\mathrm{Tr}\,B R_X = 0$ for all SLT matrices $B$. For any
$i, j \le K$ such that $i > j$, choose the matrix $B$ to be zero except for
the $ij$ component; then $\mathrm{Trace}\,B R_X = b_{ij} R_X(ji) = 0$, so that
$R_X(ji) = 0$. As $R_X$ is symmetric, this shows that the condition
$\mathrm{Tr}\,B R_X = 0$ for all SLT matrices $B$ implies $R_X$ is diagonal.
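
The proof's choice of single-entry SLT matrices can be mirrored in a few lines
of Python. This is a sketch for illustration only, with hypothetical helper
names and zero-based indices: it evaluates $\mathrm{Tr}\,B R_X$ for each SLT
matrix having a single unit entry, which picks out the below-diagonal entries
of a symmetric $R_X$.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def trace_of_product(b: Matrix, r: Matrix) -> float:
    """Tr(B R) = sum over i, j of B[i][j] * R[j][i]."""
    n = len(b)
    return sum(b[i][j] * r[j][i] for i in range(n) for j in range(n))


def elementary_slt(n: int, i: int, j: int) -> Matrix:
    """SLT matrix that is zero except for a 1 in position (i, j), i > j."""
    if not 0 <= j < i < n:
        raise ValueError("need 0 <= j < i < n")
    m = [[0.0] * n for _ in range(n)]
    m[i][j] = 1.0
    return m


def slt_traces(r: Matrix) -> Dict[Tuple[int, int], float]:
    """Tr(B R) for every single-entry SLT matrix B, keyed by (i, j)."""
    n = len(r)
    return {(i, j): trace_of_product(elementary_slt(n, i, j), r)
            for i in range(n) for j in range(i)}


def all_slt_traces_vanish(r: Matrix) -> bool:
    # By linearity in B, checking the single-entry matrices is enough.
    return all(value == 0 for value in slt_traces(r).values())


if __name__ == "__main__":
    print(all_slt_traces_vanish([[2.0, 0.0], [0.0, 5.0]]))   # diagonal
    print(slt_traces([[2.0, 1.0], [1.0, 2.0]]))              # not diagonal
```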

This development shows that feedback can increase capacity if the optimum
no-feedback message $X$ does not have uncorrelated components. From
[3, Theorem 1], the optimum no-feedback signal covariance is given by

$R_X = \left[\frac{\sum_{i=1}^{J}\beta_i + P}{J}\right]\sum_{n=1}^{J} u_n u_n^* - \sum_{m=1}^{J}\beta_m u_m u_m^*,$

where $\{u_n, n \le K\}$ are orthonormal eigenvectors of $R_N$ corresponding
to the increasing sequence of eigenvalues $(1+\beta_n)$, and $J \le K$ is the
largest integer such that $P + \sum_{i=1}^{J}\beta_i > J\beta_J$. For all
sufficiently small $P$, this gives
$R_X = \frac{P}{L}\sum_{n=1}^{L} u_n u_n^*$, where $L$ is the multiplicity of
$1+\beta_1$ as an eigenvalue of $R_N$. $R_X$ will then not be diagonal if
$\{u_i, i \le L\}$ cannot be taken to consist of elementary vectors. If $R_X$
is defined as above for $J > L$, then this property will again prevent $R_X$
from being diagonal, since a diagonal matrix must have the $K$ elementary
vectors as a complete orthonormal (c.o.n.) set of eigenvectors. This shows
that $C_F(P) > C(P)$ for all $P > 0$ if the orthogonal projection onto the
eigenmanifold of $1+\beta_1$ (as an eigenvalue of $R_N$) is not diagonal.
Further, for all sufficiently large $P$,

$R_X = \frac{1}{K}\left[\sum_{i=1}^{K}\beta_i + P + K\right] I - R_N.$

This matrix is obviously non-diagonal if $R_N$ is non-diagonal. These
observations complete the proof of the Theorem. □
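
In the eigenbasis of $R_N$, the covariance quoted from [3, Theorem 1] assigns
the weight $(P + \sum_{i \le J}\beta_i)/J - \beta_n$ to $u_n$ for $n \le J$
and zero to the remaining eigenvectors. The following Python sketch computes
$J$ and these weights for given $\beta_n$ and $P$. It is an illustration
only: the function name is hypothetical, and exact rational arithmetic is
assumed so that the checks are exact.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def no_feedback_weights(betas: Sequence, power) -> Tuple[int, List[Fraction]]:
    """Return (J, weights) for the optimum no-feedback covariance.

    betas: the beta_n, where the eigenvalues of R_N are 1 + beta_n.
    weights[n] is the eigenvalue of R_X on u_n for the first J eigenvectors.
    """
    ordered = sorted(Fraction(b) for b in betas)
    p = Fraction(power)
    if p <= 0:
        raise ValueError("power must be positive")
    j_best = 0
    for j in range(1, len(ordered) + 1):
        # J is the largest integer with P + sum_{i<=J} beta_i > J * beta_J.
        if p + sum(ordered[:j]) > j * ordered[j - 1]:
            j_best = j
    level = (p + sum(ordered[:j_best])) / j_best
    return j_best, [level - b for b in ordered[:j_best]]


if __name__ == "__main__":
    for p in (2, 10):
        j, weights = no_feedback_weights([0, 1, 3], p)
        # The weights are positive and sum to P, the power constraint.
        print(p, j, weights, sum(weights))
```

With a repeated smallest eigenvalue and small $P$, the weights are all equal to
$P/L$, in agreement with the small-$P$ form given above; for large $P$ all $K$
eigenvectors receive positive weight.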

The above results give sufficient conditions for feedback to increase
information capacity. It can be seen that the requirement that $R_N$ not be
diagonal is also a necessary condition if feedback is to increase capacity for
some value of $P$, without assuming linear feedback.